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ABSTRACT 

Restriction endonucleases are highly specific in 
recognizing the particular DNA sequence they act 
on. However, their activity is affected by sequence 
context, enzyme concentration and buffer compos- 
ition. Changes in these factors may lead to either 
ineffective cleavage at the cognate restriction 
site or relaxed specificity allowing cleavage of 
degenerate 'star' sites. Additionally, uncharacterized 
restriction endonucleases and engineered variants 
present novel activities. Traditionally, restriction 
endonuclease activity is assayed on simple sub- 
strates such as plasmids and synthesized oligo- 
nucleotides. We present and use high-throughput 
lllumina sequencing-based strategies to assay the 
sequence specificity and flanking sequence prefer- 
ence of restriction endonucleases. The techniques 
use fragmented DNA from sequenced genomes to 
quantify restriction endonuclease cleavage on a 
complex genomic DNA substrate in a single 
reaction. By mapping millions of restriction site- 
flanking reads back to the Escherichia coli and 
Drosophila melanogaster genomes we were able to 
quantitatively characterize the cognate and star site 
activity of EcoRI and Mfel and demonstrate genome- 
wide decreases in star activity with engineered high- 
fidelity variants EcoRI-HF and Mfel-HF, as well as 
quantify the influence on Mfel cleavage conferred 
by flanking nucleotides. The methods presented are 
readily applicable to all type II restriction endonucle- 
ases that cleave both strands of double-stranded 
DNA. 



INTRODUCTION 

Type II restriction endonucleases cleave double-stranded 
DNA at a constant position with respect to a short 



(3-8 bp) recognition sequence (1). Their exquisite specifi- 
city has rendered them among the most useful tools in 
molecular biology (1,2). However, the impact of add- 
itional variables such as organic solvent, ion, small 
molecule and enzyme concentrations has large effects on 
the specificity of restriction endonucleases, often leading 
to cleavage at non-cognate sites (termed star activity) 
(3-7). Many commonly used restriction endonucleases 
show some star activity even under standard reaction con- 
ditions (3). The DNA substrate itself can also modulate 
cleavage. It has been noted that nucleotides flanking the 
recognition site confer large contributions to the ener- 
getics of cleavage (8-12). Quantitative analysis of star 
activity and flanking effects will help to elucidate the struc- 
ture-function rules for restriction enzymes, define the 
window of optimal restriction endonuclease specificity as 
well as tailor reaction conditions toward novel target 
sequences. 

t)espite the conserved functionality among the restric- 
tion endonuclease family, these enzymes show great diver- 
gence in both sequence and mechanism (1,9,13,14). Apart 
from isoschizomers, most members show little sequence 
homology to each other or other known proteins (1). 
Additionally, the variable distribution of base-contacting 
residues among the restriction endonucleases has con- 
founded recognition sequence prediction (9,15,16). 
Consequently, restriction endonuclease characterization 
must be carried out empirically for each enzyme. Star 
activity (4,17-21) and flanking preference (8-12) have 
been investigated for several enzymes. These experiments 
have been performed on homogeneous substrates. A series 
of oligonucleotides containing different star or flanking 
sequences are synthesized, annealed, cleaved and 
analyzed one by one, making exhaustive studies difficult. 
Recognition site determination is typically carried out by 
digestion of a homogeneous plasmid or virus DNA sub- 
strate followed by agarose gel visualization of cleavage 
products (6,22-25). This technique is lacking both in its 
substrate complexity and sensitivity. A given cognate or 
star site could occur few times in these substrates, and at 
times, not at all. This limits the ability to accurately 
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quantify activity at different cleavage sites owing to a lack 
of diversity of flanking nucleotides. Star activity is often 
several orders of magnitude lower than cleavage at the 
cognate site (3,17). Consequently a large component of 
star activity will remain cryptic when cleavage products 
must be of sufficient abundance to be visualized on an 
agarose gel. 

The growing amount of prokaryotic genomic sequence 
putatively coding for uncharacterized restriction endo- 
nucleases (26,27) in conjunction with ongoing efforts to 
engineer altered specificities (22-25,28-30) will be aided by 
high-throughput methods to quantify restriction endo- 
nuclease activity instead of the methods currently avail- 
able. For example, to characterize the genome-wide 
digestion patterns of the methylation-specific restriction 
endonuclease AbaSDFI (31), genomic rat brain DNA 
was digested with AbaSDFI to map 5-hydroxymethyl- 
cytosines, the digestion products were cloned into 
plasmids and Sanger sequenced one by one to map 122 
cleavage sites to the rat genome. A similar strategy was 
used to demonstrate the relaxed specificity of the restric- 
tion enzyme TspGWI in the presence of sinefungin by 
Sanger sequencing 218 clones (5). 

High-throughput sequencing has become a valuable 
tool for analyzing DNA-protein interactions. The ability 
to experimentally pair a DNA-protein interaction to a 
sequencing event has enabled techniques such as ChlP- 
seq (32) to provide sensitive statistics on transcription 
factor-DNA binding. We use derivations of the RAD- 
seq (33) method to quantitatively measure restriction 
endonuclease activity across the sequenced Drosophila 
melanogaster and Escherichia coli genomes. This method 
specifically prepares DNA adjacent to restriction sites for 
Illumina sequencing, allowing the relative sequence counts 
of sites with different flanking nucleotides to be 
determined. The RAD-seq protocol was carried out with 
serial enzyme dilutions to identify flanking motif enrich- 
ment in enzyme-limiting reactions. Modifications were 
made to the protocol to sequence all cleavage events re- 
gardless of overhang to generate a complex profile of 
relative activities at cognate and star sites in a single 
experiment. We apply these methods to quantify the 
cleavage patterns of EcoRI and Mfel, to compare star 
activity with their engineered high-fidelity counterparts 
and to quantify the effect of flanking nucleotides on 
Mfel activity. 

MATERIALS AND METHODS 

Ah enzymes and buffers used in this study were 
contributed by New England BioLabs. 

Star activity assay 

To assay restriction enzyme activity on a genome-wide 
scale, we designed an unbiased strategy to sequence all 
digested fragments regardless of overhang (see Figure 1). 
In afl, 1000-1 500 bp fragments of E. coli strain REL606 
DNA were digested under star conditions, and smaller 
300-500 bp fragments whose decreased molecular mass 
indicated digestion were separated and Illumina 



sequenced. Both 1 000000 reads from EcoRI digests and 
650 000 reads from Mfel digests were mapped back to the 
REL606 genome, and adjacent cleavage sites were compu- 
tationally analyzed. 

(1) Generation of random 1000-1500 bp digestion tem- 
plates: 3|ag of REL606 genomic DNA were randomly 
sheared by sonication (Bioruptor). DNA fragments 
between 1000 and 1500 bp were then separated and 
purified by agarose gel electrophoresis. The DNA was 
then blunt-end repaired using Quick Blunting Kit and 3' 
adenylated using Klenow exo~. To distinguish the sheared 
DNA ends, non-divergent Illumina end 2 adapters 
composed of annealed oligonucleotides 5'-Phos-GATCG 
GAAGAGCGGTTCAGCAGGAATGCCGAGACCGA 
TCTCGTATGCCGTCTTCTGCTTG-3' and 5'-CAAGC 
AGAAGACGGCATACGAGATCGGTCTCGGCATT 
CCTGCTGAACCGCTCTTCCGATCT-3' were ligated 
to the 1000-1500 bp pool using concentrated T4 DNA 
ligase. Ten nanograms of this sample was used in a 20- 
cycle Phusion polymerase chain reaction (PGR) with the 
Illumina PE primer 2.0 (5'-CAAGCAGAAGACGGCAT 
ACGAGATCGGTCTCGGCATTCCTGCTGAACCGC 
TCTTCCGATCT-3') following Phusion product guide- 
lines to select 1000-1500 bp fragments with the Illumina 
end 2 sequence on each end. 

(2) Star condition digest: To generate a complex 
cleavage activity profile, DNA from the previous step 
was digested with an excess of restriction enzyme. Fifty- 
two nanogram of DNA was digested with 50 U of Mfel 
(GenBank accession number SRR652142) or high-fidelity 
Mfel (Mfel-HF; accession SRR652141) in a 50^1 reaction 
containing 1 x NEB4 and 5% glycerol for 24 h at 37°C. 
Thirty-two nanograms of DNA was digested with 200 U 
of EcoRI (accession SRR652140) in a 50^1 reaction con- 
taining Ix NEBl and 10% glycerol for 24 h at 37°C. 
Thirty-two nanograms of DNA was digested with 200 U 
of high-fidelity EcoRI (EcoRI-HF; accession SRR652139) 
in a 50^1 reaction containing Ix NEB4 and 10% glycerol 
for 24 h at 37°C. 

(3) Tagging of cleaved end with Illumina end 1 adapter: 
The digested DNA was blunt-end repaired using Quick 
Blunting Kit to neutralize all potential overhangs. The 
DNA was then 3' adenylated using Klenow exo~ and 
ligated using concentrated T4 DNA Ligase to barcoded 
divergent first-end Illumina adapters composed of 
annealed ohgos 5'-GGCGACCACCGAGATCTACACT 
CTTTCCCTACACGACGCTCTTCCGATCT-barcode- 
T-3' and 5'-Phos-barcode-AGATCGGAAGAGCGTCG 
TGTACTACGTT-3'. Ten nanograms of DNA from the 
ligation reaction was then used as template for an 1 8-cycle 
Phusion PGR with Illumina primers PE PGR Primer 2.0 
(5'-CAAGCAGAAGACGGCATACGAGATGGGTCT 
CGGCATTCCTGGTGAACCGCTCTTCCGATGT-3') 
and a shortened PE PGR Primer 1.0 (5'-AATGATAGGG 
GGAGGAGGGA-3') foUowing Phusion product guide- 
lines. Use of a divergent first-end adapter requires the 
paired-end primer to anneal first for amplification to 
occur. This eliminates the first end sequence on the sheared 
side. We used change in molecular mass to select for 
digested molecules. As the predigestion sample ranged 



Page 3 of 8 



Nucleic Acids Research, 2013. Vol. 41, No. 11 ell9 



RADseq 



Restriction site in genomic DNA 



Restriction digest genomic DNA 



Ligate overhang specific lllumina end 1 adaptor 



Sonically shear and isolate lllumina-size 
(approximately 500bp) fragments by gel 
electrophoresis 



Enzymatically end repair and add 3' adenosine 



Modified overhang- 
independent RADseq 



Restriction site in genomic DNA 



Sonically shear and isolate 1000 to ISOObp 
fragments by gel electrophoresis 



Enzymatically end repair and add 3' adenosine 
± A 



Ligate T-overhang lllumina end 2 adaptor and PCR 
amplify with lllumina PE PCR primer 2.0 to 
generate digestion template library 



-A- 
■T- 



Restriclion digest library 



■A 



Ligate T-overhang divergent lllumina end 2 
adaptor 



PCR amplify with lllumina PE PCR primer 1.0 
(pink arrow) and short lllumina PE PCR primer 
2.0 (gray arrow) 



Purify PCR product by gel electrophoresis and 
lllumina sequence fragments digested at cognate 
cleavage sites 



Enzymatically end repair and add 3' adenosine to 
neutralize overhangs from all possible double- 
stranded cleavage events 



Ligate T-overhang divergent lllumina end 1 adaptor 



PCR amplify with lllumina PE PCR primer 2.0 (blue 
arrow) and short lllumina PE PCR primer 1.0 (gray 
arrow) 



I 



Separate smaller digested fragments from larger 
non-digested fragments by gel electrophoresis and 
lllumina sequence fragments digested at all sites 



Figure 1. The path of a single restriction site-containing genomic locus is shown for both the RAD-seq protocol (left) and the modified overhang- 
independent RAD-seq protocol used in the star activity assay (right). 



from 1000 to ISOObp, we agarose gel-purified 300-500-bp 
PCR fragments for sequencing to assure cleavage. 

This final library exclusively contained molecules with a 
first end lllumina sequence on the cleaved side and a 



second end sequence on the sheared side. The experimen- 
tal samples were sequenced on an lllumina HiSeq 2000 to 
generate 100 bp single-end reads beginning at the cleavage 
site. The samples were separated by barcode and the reads 
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were mapped back to the E. coli genome to infer the 
cleavage site. 

Fidelity index determination 

The fidehty index (FI) was determined for EcoRI-HF and 
Mfel-HF by the standard method (3). The substrate used 
for all FI determinations was lambda DNA. 

Flanking sequence preference assay 

To determine the flanking sequence preferences of Mfel, 
D. melanogaster genomic DNA was digested in saturating 
and enzyme-limiting conditions. The DNA adjacent to the 
restriction site was then PGR amplified and lUumina 
sequenced as per the RAD-seq protocol (33) depicted in 
Figure 1. From each digest, 550000 reads were mapped to 
the Drosophila genome. Flanking sequence preference was 
inferred from motif enrichment in enzyme-limiting 
conditions. 

(1) Mfel digests: Digests were carried out in 50^1 reac- 
tions containing 786 ng of D. melanogaster strain Oregon- 
R genomic DNA, Ix NEB4, 1% glycerol and varying 
amounts of Mfel for 15niin at 37°C. A range of partial 
digest conditions was achieved by varying the amount of 
enzyme through 12 serial dilutions each, decreasing 
enzyme concentration by a factor of two as follows: 
Reaction 1 contained 10 U (GenBank accession number 
SRR652186), Rxn 2: 5U (accession SRR652187), Rxn 3: 
2.5 U (accession SRR652188), Rxn 4 : 1.25 U (accession 
SRR652189), Rxn 5: 0.63 U (accession SRR652190), Rxn 
6: 0.31 U (accession SRR652191), Rxn 7: 0.16 U (accession 
SRR652192), Rxn 8: 0.08 U (accession SRR652193), Rxn 
9: 0.04 U (accession SRR652194), Rxn 10: 0.02 U (acces- 
sion SRR652195), Rxn 11: 0.01 U (accession SRR652196), 
Rxn 12: 0.005 U (accession SRR652197). 

(2) RAD-seq library preparation: RAD-seq libraries 
were prepared according to Baird et al. (33) with the fol- 
lowing parameters. Mfel restriction site-associated DNA 
(RAD) adapters were composed of annealed ohgonucleo- 
tides of the form 5'-AATGATACGGCGACCACCGAG 
ATCTACACTCTTTCCCTACACGACGCTCTTCCGA 
TCT-barcode-3' and 5'-Phos-AATT-barcode-AGATCGG 
AAGAGCGTCGTGTAGGGAAAGAGTGTAGATCT 
CGGTGGTCGCCGTATCATT-3'. Each of the 12 ex- 
perimental digests was ligated to an Mfel RAD adapter 
with a unique barcode to allow sequencing on the same 
lllumina HiSeq 2000 lane. Before amplification, reactions 
1^ (high enzyme), 5-8 (mid enzyme) and 9-12 (low 
enzyme) were pooled to increase the sequence contribu- 
tion of the lower enzyme samples as the concentration of 
digested fragments was expected to be much greater in the 
higher enzyme samples. The final step in the RAD-seq 
hbrary preparation protocol is a PGR enrichment of 
DNA restriction fragments flanked by both sequences ne- 
cessary for lllumina sequencing. Ten nanograms of the 
high-enzyme ligation were PGR amplified using Phusion 
polymerase with PE PGR Primer 1.0 and a shortened 
PE PGR Primer 2.0 (5'-CAAGGAGAAGAGGGGAT 
AGGA-3') for 15 cycles. This was increased to 17 cycles 
with 1 5-ng ligation template for mid-enzyme libraries and 
20 cycles with 20-ng template for low-enzyme hbraries. 



Fragments averaging 550 bp were agarose gel purified 
from each reaction and sequenced on an lllumina HiSeq 
2000 to generate single-end 100-bp reads. 

Data processing 

Sequence reads were aligned to D melanogaster genome 
build 5.4.52, or E. coli genome REL606 using Novoalign 
v2.07 (Novocraft.com). Custom Perl scripts (available 
from EAJ on request) counted the sequence reads at 
each genomic location. For the flanking nucleotide 
assay, the flanking nucleotides were inferred from the 
genome reference sequence for each aligned read, and 
the total counts of reads for each flanking sequence 
tracked. For the star activity assay, the reads found for 
each recognition sequence were normalized by their count 
in the genome. 

RESULTS 

Star activity assay 

Restriction enzymes are known to digest DNA at non- 
cognate sequences called star sites. We developed a star 
activity assay for quantifying the relative activity of re- 
striction enzymes at cognate and non-cognate sites using 
genomic DNA as a substrate. The star activity assay com- 
prises shearing genomic DNA to a defined length, diges- 
tion with a restriction enzyme and selecting amplified 
fragments much smaller than the original sheared frag- 
ments for sequencing. Because the DNA fragments are 
blunted after digestion, the sequencing adapters ligate 
equally well to cognate and non-cognate sites. The full 
sequence of the digested site can be recovered after ahgn- 
ment of the sequence read back to the reference genome. 
Thus, the relative sequencing coverage of each genomic 
locus can be quantified, and the normalized sequencing 
coverage of each particular site sequence motif, repre- 
sented many times across a genome, can be determined. 

Mfel star activity 

After digestion of E. coli genomic DNA with Mfel in star 
activity conditions for 24 h, non-cognate sequences with 
single base pair changes from the cognate GAATTG 
were seen at digested sites. The bulk of non-cognate 
reads came from GAAGTG and its reverse complement 
CAGTTG, and a smaU number of additional reads were 
created by digestion of GAATTA, GAATTG, GAGTTG 
and their reverse complements TAATTG, GAATTG and 
GAAGTG (see Table 1). These star sites were also seen 
after digestion with an engineered high-specificity version 
of Mfel (Mfel-HF, NEB), although at much lower 
coverage compared with wild-type Mfel (see Table 1). 
For example, the percent of total reads for the most 
abundant star site, GAAGTG, was more than 6-fold 
higher for Mfel compared with Mfel-HF. Mfel-HF also 
showed a substantial reduction in star activity compared 
with the wild-type enzyme when FI was used as the metric. 
The FI determined in this study of Mfel-HF in NEB4 is 
>500, while the previously determined FI of wild-type 
Mfel in NEB4 is only 32 (3), demonstrating a > 16-fold 
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Table 1. Percent of reads at star sites after digestion with Mfel 



Site 


Mfel 


Mfel-HF 


CAACTG 


3.03 


0.46 


CAATTA 


0.14 


0.07 


CAATTC 


0.08 


0.07 


CAAGTG 


0.04 


0.11 



reduction in star activity on a simple substrate. While both 
of these assays demonstrate the increased fidehty of the 
engineered Mfel-HF, their results cannot be quantitatively 
compared owing to differences in substrate and reaction 
conditions. 

EcoRI star activity 

After digestion of E. coli genomic DNA with EcoRl in 
star activity conditions for 24 h, six non-cognate sequences 
with single base pair changes from the cognate GAATTC 
were seen at digested sites, although three of these made 
up a small fraction of the reads (see Table 2). These star 
sites comprised a significant portion of all sequences from 
the EcoRI digestion, with >31% of all reads coming from 
GAATTT sites, and GAAGTC and GAATTA sites 
having 4 and IVo of all reads, respectively. The sites 
GAACTC, GAATTG and GAATGC together made up 
only ~0.4% of all reads. The high fidehty version of 
EcoRI had much improved specificity in star activity con- 
ditions. The percent of total reads coming from star sites 
was 3000-fold lower in EcoRl-HF compared with EcoRI 
(see Table 2). As in the comparison of Mfel with Mfel- 
HF, the coverage difference between EcoRI and EcoRI- 
HF was less pronounced with the minor-frequency star 
sites. FI testing also showed a drastic improvement in spe- 
cificity for the engineered EcoRI-HF. The FI determined 
in this study of EcoRI-HF in NEB4 is > 16 000, while the 
previously determined FI of wild-type EcoRI in NEB4 is 
only 4 (3), demonstrating a >4000-fold reduction in star 
activity on lambda DNA. 

Flanking sequence preference of Mfel at cognate 
site CAATTG 

We also examined how the digestion of cognate sites is 
affected by the flanking nucleotide sequence. The simpler 
RAD method was used to generate short DNA tags at 
each cognate cleavage site. As in the previous assay, the 
number of tags found at each locus was used as a measure 
of digestion efficiency. By calculating a normalized 
coverage for each particular flanking sequence the influ- 
ence of these sequences on restriction enzyme activity 
could be determined. 

We digested genomic DNA from D. melanogaster with 
the restriction enzyme Mfel using enzyme concentrations 
that ranged from fully saturating to limiting (10-0.005 U). 
We reasoned that the highest enzyme concentrations 
would digest every available cognate site to near comple- 
tion, whereas enzyme preferences for particular sequences 
in the flanking nucleotides would be apparent at the 
lowest concentrations. RAD libraries were made for 
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Table 2. Percent of reads at star sites after digestion with EcoRI 



Site 


EcoRI 


EcoRI-HF 


GAATTT 


31.58 


0.01 


GAAGTC 


4.17 


0.01 


GAATTA 


2.64 


0.01 


GAACTC 


0.31 


0.01 


GAATTG 


0.05 


0.01 


GAATGC 


0.04 


0.01 



each enzyme concentration and sequenced to an average 
of ~3x coverage for all sites. The sequence reads were 
mapped to the genomic sequence and the flanking nucleo- 
tides extracted for each site. The read counts for sites or 
half sites sharing a flanking sequence were binned, and the 
average coverage calculated. 

The single nucleotide adjacent to Mfel had a strong 
effect on site preference (see Figure 2A). As the amount 
of Mfel was diluted, the sequencing reads became 
concentrated on preferred sites, creating higher coverage 
depth for preferred sites and lower coverage depth for sites 
that were digested less efficiently. If the site sequences are 
ranked by the change in sequencing coverage from the 
most enzyme to the least, the greatest increase in 
coverage is the pahndromic GCAATTGC, and the 
greatest decrease is the pahndromic ACAATTGT. In 
general, there is strong concordance in the coverage 
change for sites that are reverse complements of each 
other, as would be expected (see Table 3). All the sites 
with a 5' G base or 3' C base have an increase in 
coverage under dilute conditions, demonstrating that 
Mfel has a strong preference for these nucleotides 
adjacent to the cognate cut site. A 5' T base or 3' A 
base has a near neutral effect on coverage, and 5' A or 
C bases or 3' T or G bases have a negative effect on 
coverage, demonstrating that their presence in the 
flanking sequence makes an Mfel restriction site less 
likely to be cleaved in dilute enzyme conditions. 

This preference for certain sequences by the Mfel restric- 
tion enzyme extends beyond the single adjacent base. The 
5' G base preference becomes even more pronounced when 
the dinucleotide is 5' (A/T)G, but the 5' (G/C)G dinucleo- 
tide has a reduced sequencing coverage (see Figure 2B). The 
preference for A or T bases in the second 5' position away 
from the cut site is also true for the (A/T)T versus (G/C)T 
dinucleotides (see Figure 2C), but dinucleotides with a 5' A 
or C base adjacent to the restriction cut site have more 
complicated interactions. The TA dinucleotide has a 
strong positive effect on sequencing coverage, while (A/C/ 
G)A are all weakly to strongly negative (see Figure 2D). 
The TC dinucleotide has the lowest sequencing coverage of 
all dinucleotides, while (A/C/G)C have only a weak effect 
on coverage (see Figure 2E). Our data show the flanking 
effects of each dinucleotide to operate independently of the 
sum of its parts. 

Flanking sequence preference of Mfel at star 
site CAACTG 

The abundant non-cognate site CAACTG identified in 
wild-type Mfel star activity conditions was analyzed for 
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0.005 



0.31 0.08 
Units Mfel 



0.005 



0.005 




2 O GACMTTG A TACAATTG AACAATTG 

5 1.25 0.31 0.08 

Units Mfel 



GACMTTG^ 



0.02 



0.005 




5 1.25 

O GCCAATTG ■& 



0.31 0.08 

TCGAATTG ACCAATTG 

Units Mfel 



0.02 0.005 

V CCCAATTG 



Figure 2. Mfel activity is affected by flanking base preference. All 
graphs plot normalized sequencing coverage O'-axis) versus units of 
enzyme (x-axis). Blue circles, G base; green triangles, T base; yellow 
squares, A base; red triangles, C base. (A) Changes in sequencing 
coverage for the different bases adjacent to the Mfel half site, i.e. N- 
CAA. (B-E) Changes in sequencing coverage for the different distal 
bases of the dinucleotide adjacent to the Mfel half site, for the di- 
nucleotide NG-CAA (graph B), NT-CAA (graph C), NA-CAA 
(graph D), NC-CAA (graph E). 



flanking nucleotide preferences to compare flanking 
effects in star versus cognate activity. Tliere was a wide 
range of site sequencing coverage depending on the 
flanking nucleotide sequence of the CAACTG site. 
There was a 2.3-fold difference in coverage between sites 
with a 5' T base and the poorly cut star sites with a 5' A 
base (see Figure 3), which is of larger magnitude than the 



Table 3. The change in sequencing coverage from enzyme saturating 
to limiting conditions for each of the 16 single-nucleotide flanking 
pairs surrounding the cognate Mfel site 



Site 


Change in coverage 


GCAATTGC 


2.6 


TCAATTGC 


1.3 


GCAATTGA 


1.1 


GCAATTGG 


0.5 


CCAATTGC 


0.4 


ACAATTGC 


0.2 


GCAATTGT 


0.1 


TCAATTGA 


-0.1 


CCAATTGA 


-0.8 


ACAATTGA 


-0.8 


TCAATTGT 


-0.9 


TCAATTGG 


-0.9 


CCAATTGT 


-1.0 


ACAATTGG 


-1.2 


CCAATTGG 


-1.4 


ACAATTGT 


-1.4 




G T A 
5' base 

Figure 3. Mfel activity is affected by flanking base preference at 
CAACTG star sites. Bars represent the percentage of wild-type Mfel 
star activity assay reads mapping to CAACTG sites having a particular 
5' adjacent base, with a higher percentage indicating that adjacent base 
creates a favourable context for digestion. Because the star site is asym- 
metric, adjacent base preferences are shown for the two half sites, CAA 
(blue) and CAG (green). 



single-base flanking effects seen at the cognate site. 
Interestingly, the effect of particular flanking sequences 
differed between the cognate and star sites. The 5' G 
base was most preferred by the cognate site, whereas a 
5' T base was most preferred by the star site. The effect 
of flanking sequences also differed for the two distinct half 
sites of the CAACTG star site. Whereas pahndromic 
5' and 3' flanking sequences about the cognate Mfel 
sites confer the same effect, the distinct star half sites 
CAA and CAG respond differently. While both preferred 
a 5' T base and a 5' A base reduced sequencing coverage 
the most, the next most preferred 5' base was a C base for 
the CAA half site and a G base for the CAG half site (see 
Figure 3). Our data show that Mfel star site flanking pref- 
erences are distinct from those of cognate sites and that 
each asymmetric star half site may have distinct flanking 
preferences as wefl. 



DISCUSSION 

The power of next-generation sequencing has typically 
been applied to the characterization of the sequence or 
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function of a genome. Here we use the massively parallel 
nature of next-generation sequencing to assay the enzym- 
atic activity of restriction endonucleases that cleave both 
strands of double-stranded DNA. We developed a novel 
assay to allow the characterization of restriction enzyme 
recognition sites without any prior knowledge, and also 
used the related RAD-Seq method to assay the effect of 
flanking sequence on restriction enzyme cleavage. 

We first quantitatively assayed the activity of both 
EcoRI and Mfel and their high-fidehty counterparts 
(EcoRI-HF and Mfel-HF) by mapping cleavage events 
to the E. coli reference genome. For each enzyme, the 
majority of reads mapped to the cognate sites, 
demonstrating the correlation between cleavage efficiency 
and read count as well as highlighting the method's utility 
in de novo recognition site discovery. This unbiased detec- 
tion method also simultaneously quantified star activity 
over all DNA configurations present in the E. coli 
genome. The star activity occurred at sites with 1-bp sub- 
stitutions with respect to the cognate sites as has been 
previously observed (19). For both enzymes, only a 
subset of the possible single substitution sites produced 
sequence reads, which effectively identified those degener- 
ate sites capable of generating appreciable star activity. 
Different star sites showed a wide range of activity 
indicating the degree to which specific base changes are 
tolerated by the restriction enzyme. In the case of EcoRI, 
the three most abundant star sites in our data (GAATTT, 
GAAGTC and GAATTA) have been previously shown to 
be the three most efficiently cleaved (17,18). The high- 
fidelity restriction enzymes developed by New England 
Biolabs showed drastically reduced star activity 
compared with wild type. The assay was able to quantify 
this reduction across all potential cleavage sites and 
vahdate that no major cryptic DNA sequences are 
cleaved by the engineered high-fidelity variants. 

Massively parallel sequencing was also used to quantify 
the flanking preferences of Mfel. The relative presence of 
flanking nucleotides in sequence reads generated from the 
same complex substrate was compared across 12 enzyme 
concentrations using RAD-Seq. Drosophila genomic DNA 
was used as substrate to provide sufficient diversity of 
Mfel sites. Under enzyme saturation, an equal contribu- 
tion of reads from sites was observed regardless of 
flanking sequences. As enzyme concentration was 
decreased, flanking nucleotide preferences of progressively 
larger magnitude were observed. When reads were binned 
by a single flanking nucleotide, G-CAATTG sites were 
shown to be favourable, A-CAATTG and C-CAATTG 
were shown to be unfavourable and T-CAATTG was rela- 
tively neutral. Binning reads by flanking dinucleotides 
showed even larger effects. While the general trends seen 
when examining single flanking nucleotides were still 
apparent, the dinucleotide analysis underscored the 
unique energetic contributions to cleavage of each 
unique sequence context. This was shown in our data by 
the ability of a given nucleotide in the second position 
away from the cut site to confer either a positive or 
negative effect on cleavage depending on the identity of 
the adjacent nucleotide. For example, the thymine nucleo- 
tide in the second position away from the cut site led to 



increased cleavage for TG-CAATTG, TA-CAATTG and 
TT-CAATTG but decreased cleavage for TC-CAATTG. 

We also analyzed Mfel star activity from the first set of 
experiments with respect to flanking sequence. While the 
E. coli genome is not of sufficient complexity for exhaust- 
ive flanking sequence analysis, it does provide enough di- 
versity to confidently investigate effect of the adjacent 
base. Because digestion at the CAACTG star site was in- 
complete, we expected flanking preferences to be apparent 
much as they were in enzyme-limiting conditions at 
cognate Mfel sites. Indeed, we found that the flanking 
sequence affected CAACTG star site cleavage as well. In 
contrast to the palindromic Mfel cognate site, flanking 
preferences differed on each side of the asymmetric star 
sites. Notably, the Mfel star site flanking preferences are 
distinct from the cognate site flanking preferences, which 
is consistent with biophysical work suggesting star site- 
enzyme complexes are profoundly different from their 
cognate counterparts (2,34). 

In this article, we present new high-throughput methods 
to characterize restriction endonuclease activity. The two 
techniques hnk Illumina sequence reads to cleavage events 
of highly complex substrate provided by sequenced 
genomes to assay enzyme activity in a highly parallel 
fashion. The data acquired from their application to 
Mfel and EcoRI is consistent with previously described 
principles regarding restriction enzyme activity. These 
techniques are easily appHed to both previously 
characterized and newly discovered type II restriction 
endonucleases. Genome sequencing has yielded many 
thousands of putative restriction endonucleases (26), so 
the ability to quickly characterize their activity over all 
possible recognition sites will yield novel target 
specificities at a much higher rate than is currently pos- 
sible. Additionally, the structure-function relationship of 
restriction enzymes has been long-studied; these methods 
provide a rapid way to generate data about target specifi- 
city and activity for enzymes in altered conditions or 
altered protein structure. Thus, the methodology pre- 
sented and validated in this study will serve as a basis 
for applying the power of massively parallel analysis to 
the active and essential field of restriction enzymology. 
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